Java Implementation of a Single-Database
Computationally Symmetric Private Information
Retrieval (cSPIR) protocol


Felipe  Saint-Jean  * 


1  Motivation 


Picture the following scenario. Alice is looking for gold in California. Her method is
to find a place with a little gold and follow the trace. She wants to find gold in a
place where no mining patent has been awarded, but many patents were awarded in
California during the gold rush. So Alice walks around California with a GPS and a
notebook computer. Whenever she finds a trace of gold, she follows it, querying whether
any patent has been awarded at each location. If she finds a trace of gold on a piece
of land with no issued patent, she can request the patent and start mining for gold.

The problem is that she is worried that Bob’s Mining Patents Inc., the service she
queries for patents, might cheat her. Because Bob knows she is looking for gold in
California (Alice said so when signing up for the service), he knows that, if she
queries some location, then there is gold there. So, if she queries a location for
which no patent has been awarded, Bob may run to the patent office and get the mining
patent for that location himself.

Depending  on  privacy  and  economic  constraints,  a  few  solutions  come  to  mind.  Alice 
might  buy  from  Bob  the  whole  database  for  California.  Alice  then  can  make  all  the 
queries  to  her  own  database,  and  Bob  will  never  find  out  where  Alice  is  looking  for 
gold. But this might be very expensive, because Bob charges per query; what he
charges for the whole database will probably be more than what Alice is willing to
pay. Alice can also, with each query, perform a collection of fake queries so that Bob
can’t figure out which is the real query (this leaks information unless she queries the
whole database!), but that still makes Alice pay for more queries than she would like.

This  Alice-and-Bob  scenario  is  a  basic  motivation  for  Private  Information  Retrieval: 
a  family  of  two-party  protocols  in  which  one  of  the  parties  owns  a  database,  and  the 
other  wants  to  query  it  with  certain  privacy  restrictions  and  warranties.  Since  the 
PIR  problem  was  posed,  different  approaches  to  its  solution  have  been  pursued.  In 
the  following  sections,  we  will  present  the  general  ideas  of  the  variations  and  proposed 
solutions  to  the  PIR  problem.  Then,  we  will  present  a  collection  of  basic  protocols 
that  allow  the  implementation  of  a  general-purpose  PIR  protocol.  Finally,  we  will 
show  details  of  a  particular  PIR  protocol  we  have  implemented. 


2  Introduction 


As  mentioned  in  the  motivation,  the  goals  of  PIR  would  be  realized  if  Bob  were  to  send 
the  whole  database  to  Alice.  That  would  be  satisfactory  if  Bob  didn’t  care  whether 
Alice  learned  more  than  what  she  queried.  In  that  situation,  the  challenge  is  to  devise 
a  protocol  that  reduces  the  amount  of  data  Bob  has  to  send  in  order  for  Alice  to  learn 
the  answer  to  her  query  without  Bob’s  learning  what  the  query  was.  In  general,  that 
can  only  be  done  by  having  replicated,  non-communicating  databases. Going  back 
to  the  Alice-and-Bob  scenario,  there  is  not  one  Bob  but  a  collection  of  them  with 
identical  databases.  In  this  way,  the  query  can  be  hidden  if  Alice  interacts  with  all 
the  Bobs  in  such  a  way  that  each  Bob  is  never  sure  whether  the  query  he  receives  is  the 
real  one.  The  solutions  proposed  in  this  line  of  work  achieve  a  lower  number  of  queries 
(sublinear  in  the  database’s  size)  if  more  replicated  Bobs  are  available.  In  this  kind 
of  solution,  Alice’s  privacy  is  protected,  and  the  objective  is  to  reduce  the  number 
of  queries  needed.  Most  of  the  protocols  in  this  line  of  work  present  solutions  that 
are private from an information-theoretic point of view. For example, Chor et al. [6]
show that, if the database is replicated two or more times, sublinear communication
complexity can be obtained. Sublinear communication means that, in the execution of
the protocol, less than the complete database is transferred. As mentioned above, the
whole database is a trivial upper bound on the communication of the PIR problem.
In the information-theoretic approach, the protocols are composed of many
single-element queries, each with a cost of 1, because exactly one element of the
database is transferred in the response to each query.

An important improvement in PIR was put forth by Kushilevitz and Ostrovsky [2];
they presented a PIR protocol that requires no replication. Their protocol, based on
the hardness of the Quadratic Residuosity problem, is private from a computational
complexity point of view; so, to distinguish it from the information-theoretic
approach, it is known as cPIR, for "computational PIR". This idea was first considered
in [7]. The standard PIR scenario safeguards only Alice’s privacy; one variation of it
is the SPIR scenario (Symmetric PIR). In SPIR, we not only care about
Bob’s not learning anything about Alice’s query, but we also want Alice not to learn
anything about entries in Bob’s database other than the one she queried. This
is very similar to the one-out-of-N Oblivious Transfer problem, and, as we will see
later, the two are closely related.

In  this  work,  we  will  focus  on  the  implementation  of  a  specific  SPIR  protocol  proposed 
by  Naor  and  Pinkas  [3]  that  uses  Oblivious  Transfers  as  a  building  block.  One  of  the 
properties  of  this  protocol  is  that  it  requires  only  one  initialization  phase  for  a  sequence 
of  queries,  thus  amortizing  the  cost  of  the  initialization  phase.  We  will  also  show  a 
variation  of  the  protocol,  proposed  by  Boneh,  that  eliminates  the  initialization  phase 
by  introducing  a  cPIR  query  as  part  of  the  protocol. 


3  Description  of  protocols  and  other  tools 

In this section, we present a collection of protocols that are required for the SPIR
implementation. In all of them, the Sender (Bob) owns a database, and the Receiver
(Alice) wants to get the i-th value in this database.


3.1 One-out-of-two Oblivious Transfer $OT^2_1$

In a one-out-of-two Oblivious Transfer (abbrev. $OT^2_1$), the Sender holds two values,
$M_0$ and $M_1$. As a result of the protocol, the Receiver learns the value of his choice,
$M_\sigma$, and nothing about the other. The Sender learns nothing about the choice
$\sigma$ made by the Receiver. The implementation of $OT^2_1$ that we used is:

Initialization: The Sender and Receiver agree on a large prime $q$ and a generator
$g$ for $Z_q^*$. In the actual implementation, the Sender generates them and sends them
to the Receiver. The pair $(q, g)$ can be reused in several transfers, because security
relies only on the Receiver's being unable to compute discrete logs efficiently, and
the protocol gives him no extra information that would enable him to do so. $H(\cdot)$ is
a random oracle; in practice, a hash function.


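The protocol then proceeds as follows, where $\sigma \in \{0, 1\}$ is the Receiver's choice:

Sender: chooses a random element $C$ in $Z_q^*$ and sends it to the Receiver.

Receiver: chooses $k$ at random, $k \leftarrow_r Z_q^*$; lets $PK_\sigma = g^k$ and
$PK_{1-\sigma} = C / PK_\sigma$; sends $PK_0$ to the Sender.

Sender: computes $PK_1 = C / PK_0$; chooses $r_0, r_1 \leftarrow_r Z_q^*$; computes
$E_0 = \langle g^{r_0}, H(PK_0^{r_0}) \oplus M_0 \rangle$ (encryption of $M_0$) and
$E_1 = \langle g^{r_1}, H(PK_1^{r_1}) \oplus M_1 \rangle$ (encryption of $M_1$);
sends $E_0, E_1$ to the Receiver.

Receiver: recovers $M_\sigma = H((g^{r_\sigma})^k) \oplus (H(PK_\sigma^{r_\sigma}) \oplus M_\sigma)$,
that is, $H((g^{r_\sigma})^k)$ XORed with the second component of $E_\sigma$.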


In our implementation, the random oracle $H$ is implemented as a hash function.
Thus, the size of each $M_i$ can be at most the size of the output of $H$. We used
SHA-256; so, $M_i$ can be up to 256 bits.
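The message flow above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code of the implementation described here: the class and method names (Ot21Sketch, transfer, and so on) are hypothetical, both roles run in a single process instead of over a network, and the toy parameters in main are not checked for security (in particular, g is not verified to be a generator).

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

/** Illustrative sketch of the OT^2_1 message flow; all names are hypothetical. */
class Ot21Sketch {
    static final SecureRandom RNG = new SecureRandom();

    /** H: SHA-256 of the byte encoding of a group element. */
    static byte[] hash(BigInteger x) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(x.toByteArray());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** XORs msg (at most 32 bytes) with the leading bytes of pad. */
    static byte[] xor(byte[] pad, byte[] msg) {
        if (msg.length > pad.length) {
            throw new IllegalArgumentException("message longer than hash output");
        }
        byte[] out = new byte[msg.length];
        for (int i = 0; i < msg.length; i++) out[i] = (byte) (pad[i] ^ msg[i]);
        return out;
    }

    /** Random element of [1, q-1], by rejection sampling. */
    static BigInteger randomNonZero(BigInteger q) {
        BigInteger r;
        do {
            r = new BigInteger(q.bitLength(), RNG);
        } while (r.signum() == 0 || r.compareTo(q) >= 0);
        return r;
    }

    /** Runs both roles in one process; returns what the Receiver decodes for choice sigma. */
    static byte[] transfer(BigInteger q, BigInteger g, byte[] m0, byte[] m1, int sigma) {
        // Sender: random C, sent to the Receiver.
        BigInteger c = randomNonZero(q);

        // Receiver: PK_sigma = g^k, PK_{1-sigma} = C / PK_sigma; only PK_0 is sent.
        BigInteger k = randomNonZero(q);
        BigInteger pkSigma = g.modPow(k, q);
        BigInteger pkOther = c.multiply(pkSigma.modInverse(q)).mod(q);
        BigInteger pk0 = (sigma == 0) ? pkSigma : pkOther;

        // Sender: PK_1 = C / PK_0, then E_b = <g^{r_b}, H(PK_b^{r_b}) xor M_b>.
        BigInteger pk1 = c.multiply(pk0.modInverse(q)).mod(q);
        BigInteger r0 = randomNonZero(q);
        BigInteger r1 = randomNonZero(q);
        BigInteger[] first = { g.modPow(r0, q), g.modPow(r1, q) };
        byte[][] second = {
            xor(hash(pk0.modPow(r0, q)), m0),
            xor(hash(pk1.modPow(r1, q)), m1)
        };

        // Receiver: H((g^{r_sigma})^k) removes the mask from E_sigma.
        return xor(hash(first[sigma].modPow(k, q)), second[sigma]);
    }

    public static void main(String[] args) {
        BigInteger q = BigInteger.ONE.shiftLeft(127).subtract(BigInteger.ONE); // 2^127 - 1, a prime
        BigInteger g = BigInteger.valueOf(3); // toy base, assumed for illustration only
        byte[] m0 = "left".getBytes(StandardCharsets.UTF_8);
        byte[] m1 = "right".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(transfer(q, g, m0, m1, 1), StandardCharsets.UTF_8));
    }
}
```

Decoding works for either choice because $PK_\sigma^{r_\sigma} = (g^{r_\sigma})^k$; the Receiver cannot strip the other mask without the discrete log of $PK_{1-\sigma}$.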


3.2 One-out-of-N Oblivious Transfer $OT^N_1$

One-out-of-N Oblivious Transfer (abbrev. $OT^N_1$) is a generalization of $OT^2_1$. In $OT^N_1$,
the Sender holds a list of $N$ elements instead of 2. Here also, the desired properties are
that the Receiver learn only the $I$-th value in the database and that the Sender learn
nothing about $I$. Our implemented $OT^N_1$ protocol is the following:

Initialization: The Sender holds values $X_1, X_2, ..., X_N$ with each $X_i \in \{0, 1\}^m$ and
$N = 2^l$. The Receiver wants to learn $X_I$.
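
Sender: prepares $l = \lceil \log_2 N \rceil$ random pairs of keys
$(K_1^0, K_1^1), (K_2^0, K_2^1), ..., (K_l^0, K_l^1)$,
where each $K_j^b$ is a $t$-bit key to the pseudorandom function $F_K$.

Sender: for all $1 \le I \le N$, lets $(i_1, i_2, ..., i_l)$ be the bits of $I$ and computes
$Y_I = X_I \oplus \bigoplus_{j=1}^{l} F_{K_j^{i_j}}(I)$.

Sender and Receiver: for each $1 \le j \le l$, engage in an $OT^2_1$ for the strings
$\langle K_j^0, K_j^1 \rangle$. In the $j$-th $OT^2_1$, the Receiver picks $K_j^{i_j}$, where
$(i_1, ..., i_l)$ are the bits of the index $I$ he wants to learn.

Sender: sends $Y_1, ..., Y_N$ to the Receiver.

Receiver: reconstructs $X_I = Y_I \oplus \bigoplus_{j=1}^{l} F_{K_j^{i_j}}(I)$.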


The paper by Naor and Pinkas that proposed this protocol [4] is ambiguous: it defines
the pseudorandom function $F$ as $F_K : \{0, 1\}^m \to \{0, 1\}^m$ but uses it as $F_K(I)$, where
$1 \le I \le N$. For our implementation, we needed a pseudorandom function
$F_K : \{0, 1\}^l \to \{0, 1\}^m$, where $l = \lceil \log_2 N \rceil$.


Regarding the domain of this $OT^N_1$ implementation: each key $K_j^b$ will be an input to an
$OT^2_1$, and the $X_i$ have to be the same size as the output of the pseudorandom function
used, because they are XORed with it.
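The masking and unmasking steps of this construction can be sketched as follows. The sketch makes several assumptions that are not taken from the implementation described here: HMAC-SHA256 stands in for the pseudorandom function $F_K$, indices run from 0 to N-1 rather than 1 to N, keys are 16 bytes, and the $l$ executions of $OT^2_1$ are skipped (the Receiver is simply handed the keys he would have obtained). All class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative sketch of the OT^N_1 masking step; all names are hypothetical. */
class OtN1Sketch {
    /** F_K(I): HMAC-SHA256 under key K of the 4-byte encoding of I (an assumed PRF). */
    static byte[] prf(byte[] key, int index) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(ByteBuffer.allocate(4).putInt(index).array());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** acc ^= pad, over the length of acc (acc must be at most 32 bytes). */
    static void xorInto(byte[] acc, byte[] pad) {
        for (int i = 0; i < acc.length; i++) acc[i] ^= pad[i];
    }

    /** Sender: l pairs of random keys; keys[j][b] plays the role of K_j^b. */
    static byte[][][] makeKeys(int l) {
        SecureRandom rng = new SecureRandom();
        byte[][][] keys = new byte[l][2][16];
        for (byte[][] pair : keys) {
            rng.nextBytes(pair[0]);
            rng.nextBytes(pair[1]);
        }
        return keys;
    }

    /** Sender: Y_I = X_I xor F_{K_1^{i_1}}(I) xor ... xor F_{K_l^{i_l}}(I) for every I. */
    static byte[][] mask(byte[][] x, byte[][][] keys) {
        byte[][] y = new byte[x.length][];
        for (int idx = 0; idx < x.length; idx++) {
            y[idx] = x[idx].clone();
            for (int j = 0; j < keys.length; j++) {
                int bit = (idx >> j) & 1; // j-th bit of the index
                xorInto(y[idx], prf(keys[j][bit], idx));
            }
        }
        return y;
    }

    /** Receiver: holds only chosen[j] = K_j^{i_j} and removes the mask from Y_I. */
    static byte[] unmask(byte[] yI, byte[][] chosen, int idx) {
        byte[] out = yI.clone();
        for (byte[] k : chosen) xorInto(out, prf(k, idx));
        return out;
    }

    public static void main(String[] args) {
        byte[][] x = new byte[4][];
        for (int i = 0; i < 4; i++) x[i] = ("entry-" + i).getBytes(StandardCharsets.UTF_8);
        byte[][][] keys = makeKeys(2); // l = log2(4)
        byte[][] y = mask(x, keys);
        int idx = 2;
        byte[][] chosen = { keys[0][idx & 1], keys[1][(idx >> 1) & 1] };
        System.out.println(new String(unmask(y[idx], chosen, idx), StandardCharsets.UTF_8));
    }
}
```

Any other entry $Y_{I'}$ stays masked, because $I'$ differs from $I$ in at least one bit and the Receiver lacks the key for that bit.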


3.3  PIR 

In  a  PIR  protocol,  the  Sender  holds  a  database  of  size  N.  The  Receiver  wants  the  i-th 
value in this database. As a result of the protocol, the Receiver must learn the i-th
entry in the database, but the Sender must learn nothing about i. In a general PIR
protocol, by "learn nothing", we mean that a computationally unbounded Sender can
learn  nothing  about  i.  That  means  privacy  is  preserved  from  an  information-theoretic 
point  of  view.  We  mention  this  kind  of  protocol  for  clarity,  but  it  is  not  used  in  our 
implementation.  There  is  no  restriction  on  what  the  Receiver  can  learn  as  a  result  of 
the  protocol. 

3.4 cPIR


A  cPIR  protocol  is  similar  to  a  PIR  protocol.  The  only  difference  is  that  privacy  is 
safeguarded  against  a  polynomially  bounded  Sender. 


3.5 Other tools and protocols required along the way

A  few  additional  cryptographic  techniques  will  be  needed  to  implement  the  SPIR 
protocol. 


Random  Oracle 

A random oracle is a protocol-design tool that gives all the parties in the protocol a
common source of random bits. In practice, the shared randomness is provided by a
cryptographically strong hash function such as SHA-256, the one used in our implementation.


Sum-consistent  synthesizer 

A  sum-consistent  synthesizer  is  a  function  S  for  which  the  following  holds: 

•  S  is  a  pseudo-random  synthesizer. 

• For every $X, Y$ and $X', Y'$, if $X + Y = X' + Y'$, then $S(X, Y) = S(X', Y')$.

Here, a pseudo-random synthesizer is basically a pseudorandom function of many
variables that is pseudorandom in each one of them. Pseudo-random synthesizers
were introduced by Naor and Reingold in [1].
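The sum-consistency property is easy to check on a concrete candidate. The sketch below uses the instantiation reported later in this paper, S(A, B) = sha256(A + B); the use of BigInteger addition without a modulus and the name SumConsistentSketch are assumptions made only for illustration.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Illustrative check of sum consistency for S(A, B) = sha256(A + B). */
class SumConsistentSketch {
    /** S(a, b): SHA-256 of the byte encoding of a + b. */
    static byte[] synth(BigInteger a, BigInteger b) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(a.add(b).toByteArray());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        BigInteger x = BigInteger.valueOf(1000);
        BigInteger y = BigInteger.valueOf(234);
        BigInteger r = BigInteger.valueOf(77);
        // Same sum (x + r) + (y - r) = x + y, so the outputs agree.
        System.out.println(Arrays.equals(synth(x, y), synth(x.add(r), y.subtract(r)))); // true
        // Different sum, so the outputs differ.
        System.out.println(Arrays.equals(synth(x, y), synth(x, y.add(BigInteger.ONE)))); // false
    }
}
```

Because S depends on its inputs only through their sum, shifting one argument by r and the other by -r leaves the output unchanged, which is exactly what the SPIR protocol exploits.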


4  cSPIR  Implementation  details 


The  scenario  in  which  an  SPIR  protocol  is  used  is  similar  to  that  in  which  a  cPIR 
protocol  is  used.  However,  at  the  end  of  the  execution  of  an  SPIR  protocol,  the 
Receiver  should  have  learned  nothing  about  values  in  the  database  other  than  the 
i-th one. Note that SPIR is the most constrained of all PIR variations and that an
$OT^N_1$ is an SPIR protocol.

We implemented a variation of version 3 of the protocol presented in [3]. Version 3
of the protocol is not secure, because, after several queries (3 to be exact), it leaks
information. The authors of [3] propose a high-cost fix. In our implementation, we
lower the cost by modifying a step in the protocol. The original protocol, as described
in [3], is:

Initialization: The Sender prepares $2\sqrt{N}$ random keys
$(R_1, R_2, ..., R_{\sqrt{N}}, C_1, C_2, ..., C_{\sqrt{N}})$.
For every pair $1 \le i, j \le \sqrt{N}$, the Sender prepares a commitment $Y_{ij}$ of $X_{ij}$,
$Y_{ij} = commit_{K_{ij}}(X_{ij})$, where $K_{ij} = S(R_i, C_j)$. It sends all of the commitments
to the Receiver.

If the Receiver wants to learn $X_{ij}$, then the protocol proceeds as follows:


Sender: chooses $r_C$ at random in the same space the keys $R$ and $C$ are chosen from,
and sets $r_R = -r_C$, so that $r_C + r_R = 0$.

Sender and Receiver: engage in an $OT^{\sqrt{N}}_1$ protocol for the values
$R_1 + r_R, R_2 + r_R, ..., R_{\sqrt{N}} + r_R$. The Receiver gets $R_i + r_R$.

Sender and Receiver: engage in an $OT^{\sqrt{N}}_1$ protocol for the values
$C_1 + r_C, C_2 + r_C, ..., C_{\sqrt{N}} + r_C$. The Receiver gets $C_j + r_C$.

Receiver: computes $K_{ij} = S(R_i + r_R, C_j + r_C)$, opens the commitment $Y_{ij}$,
and reveals $X_{ij}$.


In the modified version, if the Receiver wants to learn $X_{ij}$, then the protocol proceeds
as follows:

Sender: prepares $2\sqrt{N}$ random keys
$(R_1, R_2, ..., R_{\sqrt{N}}, C_1, C_2, ..., C_{\sqrt{N}})$.
For every pair $1 \le i, j \le \sqrt{N}$, the Sender prepares a commitment $Y_{ij}$ of $X_{ij}$,
$Y_{ij} = commit_{K_{ij}}(X_{ij})$.

Receiver: makes a PIR query over the $Y$s and learns $Y_{ij}$.

Sender: chooses $r_C$ at random and sets $r_R = -r_C$, so that $r_C + r_R = 0$.

Sender and Receiver: engage in an $OT^{\sqrt{N}}_1$ protocol for the values
$R_1 + r_R, R_2 + r_R, ..., R_{\sqrt{N}} + r_R$. The Receiver gets $R_i + r_R$.

Sender and Receiver: engage in an $OT^{\sqrt{N}}_1$ protocol for the values
$C_1 + r_C, C_2 + r_C, ..., C_{\sqrt{N}} + r_C$. The Receiver gets $C_j + r_C$.

Receiver: computes $K_{ij} = S(R_i + r_R, C_j + r_C)$ and opens the commitment $Y_{ij}$
to obtain $X_{ij}$.


The main difference is that the original protocol requires $\Omega(N)$ initialization network
traffic to send the commitments. In the modified version, that is replaced by a PIR
query. One can randomize the commitments in each step to solve the
information-leakage problem. The price of this is $O(N)$ local computation, which is better for
overall protocol efficiency than $\Omega(N)$ communication.


The commit function was implemented with the symmetric cryptosystem AES:
$commit_{K_{ij}}(X_{ij}) = AES\text{-}ENC_{K_{ij}}(X_{ij})$. The sum-consistent synthesizer $S$ was
implemented with the hash function sha256: $S(A, B) = sha256(A + B)$.
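To make the key derivation and the commitment step concrete, here is a minimal sketch. Several details are assumptions, since they are not specified above: keys and masks are taken from $Z_{2^{128}}$, the AES key is the first 16 bytes of the sha256 output, and AES runs in CBC mode with a fixed all-zero IV and PKCS5 padding. The class name CommitSketch and its methods are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative sketch of K_ij = S(., .) and commit = AES-ENC; names are hypothetical. */
class CommitSketch {
    static final BigInteger MOD = BigInteger.ONE.shiftLeft(128); // assumed key space Z_{2^128}

    /** AES key = first 16 bytes of sha256((a + b) mod 2^128). */
    static byte[] deriveKey(BigInteger a, BigInteger b) {
        try {
            byte[] h = MessageDigest.getInstance("SHA-256").digest(a.add(b).mod(MOD).toByteArray());
            return Arrays.copyOf(h, 16);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Shared AES helper; mode, IV and padding are assumptions of this sketch. */
    private static byte[] aes(int mode, byte[] key, byte[] data) {
        try {
            Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
            c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(new byte[16]));
            return c.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Sender: Y_ij = AES-ENC_{K_ij}(X_ij). */
    static byte[] commit(byte[] key, byte[] x) {
        return aes(Cipher.ENCRYPT_MODE, key, x);
    }

    /** Receiver: opens Y_ij with the key rebuilt from the masked values. */
    static byte[] open(byte[] key, byte[] y) {
        return aes(Cipher.DECRYPT_MODE, key, y);
    }

    public static void main(String[] args) {
        BigInteger ri = new BigInteger("123456789012345678901234567890");
        BigInteger cj = new BigInteger("987654321098765432109876543210");
        byte[] y = commit(deriveKey(ri, cj), "no patent issued".getBytes(StandardCharsets.UTF_8));

        // Per query: r_R = -r_C, so the masks cancel inside S.
        BigInteger rc = new BigInteger("55555555555555555555");
        BigInteger rr = MOD.subtract(rc);
        byte[] k = deriveKey(ri.add(rr).mod(MOD), cj.add(rc).mod(MOD));
        System.out.println(new String(open(k, y), StandardCharsets.UTF_8));
    }
}
```

The Receiver only ever sees $R_i + r_R$ and $C_j + r_C$, yet derives the same key the Sender used, because the two masks sum to zero.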


5  Network  Layer 


All  of  these  protocols  involve  two-party  computations.  To  implement  them,  we  needed 
a  network  layer.  Because  the  implementation  was  done  in  Java,  a  natural  choice 
would  have  been  RMI.  The  problem  with  RMI  is  that  it  does  not  fit  well  with  the 
kind of message-driven way protocols are usually described. That means that using
RMI forces the code to be structured differently from the way protocols are specified.
That doesn’t seem too important at first glance, but it makes a big difference when
checking and going through code. It is much easier to go over a piece of code that
looks like the protocol written in the original paper. For that reason, our network
layer was implemented by means of messages. Another important requirement of the
network layer is support for nested calls. The SPIR implementation
is built upon other two-party protocols; so we needed to have a persistent connection in
all the nested protocol calls. For a simple solution that combined both requirements,
we  implemented  the  NetworkBroker  class.  The  NetworkBroker  class  is  a  symmetric 
class  written  over  TCP  sockets  that  allows  the  sending  of  serializable  objects  over 
a  network  in  a  way  that  is  very  natural  for  implementing  protocols  involving  two 
parties.  Here  is  a  short  code  sample  showing  an  object  message  passing  through  a 
pair  of  threads: 

public void testSimpleCase() throws IOException, ClassNotFoundException {
    int port = 40000;

    final NetObjectBrokerServer server = new NetObjectBrokerServer(port);
    final Integer sc = Integer.valueOf(900); // sent from server to client
    final Integer cs = Integer.valueOf(901); // sent from client to server

    // Server side: accept one connection, receive cs, reply with sc.
    Runnable r = new Runnable() {
        public void run() {
            try {
                server.accept();
                System.out.println("Got a connection");
                Integer cs1 = (Integer) server.waitObject();
                assertEquals(cs1.intValue(), cs.intValue());
                server.sendObject(sc);
                server.close();
            } catch (Exception e) {
                e.printStackTrace();
                fail();
            }
        }
    };

    // Server is waiting
    new Thread(r).start();
    System.out.println("Here!!");

    // Client side: connect, send cs, and check that the reply is sc.
    NetObjectBrokerClient client =
        new NetObjectBrokerClient("localhost", port);
    System.out.println("Connected, sending obj");
    client.sendObject(cs);

    Integer sc1 = (Integer) client.waitObject();
    assertEquals(sc1.intValue(), sc.intValue());
    client.close();
}
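
The source of the broker classes is not reproduced here. As a rough illustration of how a symmetric, message-oriented channel of this kind could be built on TCP sockets and Java serialization, a minimal sketch follows. The class ObjectChannelSketch and its roundTrip helper are hypothetical; only the method names sendObject and waitObject are borrowed from the test above, and nothing here is claimed to match the actual implementation.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical stand-in for a broker: one persistent TCP connection carrying objects. */
class ObjectChannelSketch implements AutoCloseable {
    private final Socket socket;
    private final ObjectOutputStream out;
    private final ObjectInputStream in;

    ObjectChannelSketch(Socket socket) throws IOException {
        this.socket = socket;
        // Open and flush the output side first: each peer's ObjectInputStream
        // constructor blocks until it has read the other side's stream header.
        this.out = new ObjectOutputStream(socket.getOutputStream());
        this.out.flush();
        this.in = new ObjectInputStream(socket.getInputStream());
    }

    /** Sends one serializable message to the peer. */
    void sendObject(Serializable obj) throws IOException {
        out.writeObject(obj);
        out.flush();
    }

    /** Blocks until the peer sends the next message. */
    Object waitObject() throws IOException, ClassNotFoundException {
        return in.readObject();
    }

    @Override
    public void close() throws IOException {
        socket.close();
    }

    /** Loopback demo: a server thread echoes one message back to the client. */
    static Object roundTrip(Serializable msg) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback)) { // port 0 = any free port
            Thread server = new Thread(() -> {
                try (ObjectChannelSketch ch = new ObjectChannelSketch(listener.accept())) {
                    ch.sendObject((Serializable) ch.waitObject());
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            server.start();
            try (ObjectChannelSketch ch =
                     new ObjectChannelSketch(new Socket(loopback, listener.getLocalPort()))) {
                ch.sendObject(msg);
                Object reply = ch.waitObject();
                server.join();
                return reply;
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(Integer.valueOf(900)));
    }
}
```

Because both ends use the same class, a nested protocol can be handed the open channel and continue exchanging messages on the same connection.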


6  Conclusions  and  extensions 

The  main  objective  of  this  work  is  to  see  how  applicable  PIR  protocols  are  in  practice. 
A  typical  database-oriented  application  would  benefit  from  a  few  more  features  of  the 
query engine. Interesting extensions, from that point of view, would include the
ability to query for existence of an entry and the ability to query for string-valued
or non-sequential keys. That can be easily implemented if the Sender maintains two
tables. More ambitiously, we would like to be able to compute joins in a private way
without NM complexity, where N and M are the sizes of the joined tables.

An $OT^N_1$ is an SPIR query: it has the same privacy properties. It would be interesting
to identify the cases in which an SPIR query is more efficient than an $OT^N_1$; then, in
many places where an $OT^N_1$ is done, it might be replaced by a recursive SPIR query.

Many optimizations can be done. One of them is to replace the basic implementation
of $OT^2_1$ with a more efficient one.